User-Level DMA without Operating System Kernel Modification
Abstract
Direct Memory Access (DMA) is frequently used to transfer data between the main memory of a host computer and the interconnection network, in order to free the host processor from the burden of the transfer. DMA operations are traditionally initiated by the operating system kernel, mainly to prevent one application from tampering with another application's data. Recent architecture trends suggest that interconnection networks are getting faster while operating systems are getting slower relative to processor speeds. These trends imply that the initiation of a DMA operation becomes slower because of operating system involvement, while the DMA data transfer itself becomes faster over time. Soon, the operating system overhead associated with starting a DMA will be larger than the data transfer itself, especially for small transfers. This paper proposes several algorithms that allow user-level applications to start DMA operations without the involvement of the operating system. Our algorithms give user applications direct but controlled access to the DMA engine registers. Low-overhead user-level DMA is achieved without compromising protection and without requiring changes to the underlying operating system kernel. Using our proposed algorithms, a DMA operation can be initiated in only a few assembly instructions; by comparison, operating-system-based initiation of DMA requires thousands of assembly instructions.

The authors are also with the University of Crete. Copyright IEEE. Published in the Proceedings of the Third International Symposium on High Performance Computer Architecture, February, San Antonio, Texas, USA.

Introduction

Popular contemporary computing environments consist of powerful workstations connected via a network, which in many cases has high throughput, resulting in systems called workstation clusters or Networks of Workstations (NOWs). The availability of such computing and communication power gives rise to new applications such as multimedia, high-performance scientific computing, real-time applications, engineering design and simulation, and so on. Until recently, only high-performance parallel processors and supercomputers were able to satisfy the computing requirements of these applications. Fortunately, the development of superscalar RISC processors has significantly increased the computing ability of modern workstations and microcomputers. At the same time, recent improvements in high-speed link technology have led to the development of communication networks that sustain bandwidth on the order of gigabits per second (Gbps). To allow fast processors to make efficient use of all the available bandwidth, several user-level memory-mapped network interfaces have been developed and manufactured. Most of these interfaces use Direct Memory Access (DMA) operations to transfer data from one workstation to another.

DMA has been heavily used to transfer data between fast main memory and slow magnetic disks, to free the host processor from the burden of transferring the data itself. DMA management has traditionally been done by the operating system kernel. The operating system is the only trusted entity that is allowed to access DMA registers; user applications are not allowed to initiate DMA operations by themselves. There are two reasons why the operating system must be involved in starting a DMA operation in traditional systems.

Atomicity: To start a DMA operation, the software must pass several arguments to a DMA engine. At least three arguments are needed: the source address, the destination address, and the size of the DMA
transfer. All these arguments must be given to the DMA engine atomically; otherwise, two processes that want to initiate DMA operations at about the same time may overwrite each other's arguments in their attempt to grab the DMA engine. To resolve such race conditions, the processes invoke the operating system, which runs uninterrupted, starts the DMA operation of the first process and, when that is finished, starts the DMA of the second process.

Protection from programming errors and malicious users: Most DMA engines accept only physical addresses as the source and destination addresses of a DMA operation. Ordinary users should not be allowed to pass physical addresses to a DMA engine, since they may pass physical addresses that they are not allowed to access. Thus, an ignorant or malicious user may start a DMA operation from or to memory addresses that s/he normally has no access to. As a result, s/he may read private data, destroy the operating system, or crash the computer. The only trustworthy entity that can determine which user is allowed to access which physical addresses is the operating system.

In previous decades, since the overhead of operating system involvement in the initiation of a DMA was small compared to the DMA data transfer itself, no attempt was made to allow user applications to start DMA operations. However, in contemporary fast local area networks, starting a DMA operation from inside the operating system kernel may take longer than the network transfer itself. For this reason, several researchers have started to address the problem of letting user applications initiate DMA. Pioneering work in the SHRIMP and FLASH projects has pinpointed the importance of user-level DMA operations and has proposed initial solutions to user-level DMA. Unfortunately, these approaches require modifications to the operating system kernel. To function correctly, both approaches modify the operating system's context-switch handler in order to enforce
atomicity of user-level DMA operations and avoid race conditions. The SHRIMP approach requires that the context-switch handler abort all half-started DMA operations, so that no race condition can occur, while the FLASH approach requires that the context-switch handler inform the DMA engine of the identity of the running process at context-switch time, so that the DMA engine has enough information to avoid race conditions. Although adding a few lines of code to the context-switch handler seems a trivial change, it may turn out to be a major obstacle to the success of user-level DMA, for the following reasons.

Modifications of the operating system kernel may not be possible, because the source code of the operating system may be confidential or available only under a license. In either case, ordinary users may not be able or willing to acquire the operating system sources.

Even if the changes to the context-switch handler are distributed as an operating system patch, they may generate even more problems. Distributing changes for user-level DMA to existing operating systems as patches sets a bad example: if all peripheral device vendors start distributing patches to existing operating systems, different patches will eventually conflict with each other, leading to erroneous code. Patches are also difficult to maintain, since they force the vendor of the DMA device to produce a new patch for each new version of the operating system.

The context-switch handler is usually on the critical path of operating system performance. If every device manufacturer adds a few lines of code to the context-switch handler, operating system performance will be significantly lower.

In this paper, we propose several solutions to the user-level DMA problem that require no modifications to the operating system kernel. Two of them are novel, and the other two are elaborations of our earlier designs. Our methods allow user applications to securely and atomically start DMA operations from user level without needing
to change the operating system kernel.

User-Level DMA: Early Work
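As a rough illustration of the atomicity race and of the two kernel-assisted fixes summarized in the introduction, the following toy Python model simulates a DMA engine whose source, destination, and size are written through separate registers. `DmaEngine`, `Policy`, and `race_scenario` are hypothetical names; the model only mimics the behavior described above (SHRIMP-style aborting of half-started operations on a context switch, FLASH-style informing the engine of the running process) and is not the actual SHRIMP or FLASH hardware interface.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Policy(Enum):
    NO_KERNEL_HELP = auto()   # shared registers, nothing happens on a context switch
    ABORT_ON_SWITCH = auto()  # SHRIMP-like: switch handler discards half-written arguments
    TAG_BY_PROCESS = auto()   # FLASH-like: switch handler tells the engine who is running


ALL_ARGS = {"src", "dst", "size"}


@dataclass
class DmaEngine:
    """Hypothetical toy DMA engine with separate argument registers (not real hardware)."""
    policy: Policy
    running: int = 1
    # Argument sets keyed by process id; only TAG_BY_PROCESS uses more than key 0.
    arg_sets: dict = field(default_factory=dict)

    def _args(self) -> dict:
        key = self.running if self.policy is Policy.TAG_BY_PROCESS else 0
        return self.arg_sets.setdefault(key, {})

    def write(self, name: str, value: int) -> None:
        """One user-level store to an argument register."""
        self._args()[name] = value

    def context_switch(self, next_pid: int) -> None:
        """The kernel hook that the kernel-assisted schemes need (a kernel modification)."""
        if self.policy is Policy.ABORT_ON_SWITCH:
            self._args().clear()  # abort the half-started operation
        self.running = next_pid

    def start(self):
        """Launch a transfer; return (src, dst, size), or None if the start is rejected."""
        args = self._args()
        if self.policy is not Policy.NO_KERNEL_HELP and set(args) != ALL_ARGS:
            return None  # incomplete argument set: the process must retry
        return (args.get("src"), args.get("dst"), args.get("size"))


def race_scenario(policy: Policy):
    """Process 1 is preempted between argument writes while process 2 runs a full DMA."""
    e = DmaEngine(policy)
    e.write("src", 0x1000)  # process 1 begins
    e.write("dst", 0x2000)
    e.context_switch(2)
    e.write("src", 0x8000)  # process 2 issues a complete DMA
    e.write("dst", 0x9000)
    e.write("size", 64)
    e.start()
    e.context_switch(1)
    e.write("size", 128)    # process 1 resumes and starts its DMA
    return e.start()


for p in Policy:
    print(p.name, race_scenario(p))
```

Without kernel help, process 1 launches a transfer that combines process 2's source and destination with its own size, which is exactly the race described under Atomicity. Both kernel-assisted variants avoid it (one by rejecting the interrupted start so the process retries, the other by keeping per-process arguments), but only because `context_switch` runs inside a modified kernel; that hook is what the approach proposed in this paper aims to eliminate.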